Standards in Genomic Sciences (2014) 9:473-483 



DOI:10.4056/sigs.4828625 



Genome sequence of Ensifer arboris strain LMG 14919T; a
microsymbiont of the legume Prosopis chilensis growing
in Kosti, Sudan

Wayne Reeve¹*, Rui Tian¹, Lambert Brau², Lynne Goodwin³, Christine Munk³, Chris Detter³,
Roxanne Tapia³, Cliff Han³, Konstantinos Liolios⁴, Marcel Huntemann⁴, Amrita Pati⁴, Tanja
Woyke⁴, Konstantinos Mavrommatis⁵, Victor Markowitz⁵, Natalia Ivanova⁴, Nikos Kyrpides⁴
& Anne Willems⁶

¹ Centre for Rhizobium Studies, Murdoch University, Western Australia, Australia
² School of Life and Environmental Sciences, Deakin University, Victoria, Australia
³ Los Alamos National Laboratory, Bioscience Division, Los Alamos, New Mexico, USA
⁴ DOE Joint Genome Institute, Walnut Creek, California, USA
⁵ Biological Data Management and Technology Center, Lawrence Berkeley National
Laboratory, Berkeley, California, USA
⁶ Laboratory of Microbiology, Department of Biochemistry and Microbiology, Faculty of
Sciences, Ghent University, Belgium

*Correspondence: Wayne Reeve (W.Reeve@murdoch.edu.au) 

Keywords: root-nodule bacteria, nitrogen fixation, rhizobia, Alphaproteobacteria

Ensifer arboris LMG 14919T is an aerobic, motile, Gram-negative, non-spore-forming
rod that can exist as a soil saprophyte or as a legume microsymbiont of several species
of legume trees. LMG 14919T was isolated in 1987 from a nodule recovered from the
roots of the tree Prosopis chilensis growing in Kosti, Sudan. LMG 14919T is highly
effective at fixing nitrogen with P. chilensis (Chilean mesquite) and Acacia senegal
(gum arabic tree or gum acacia). LMG 14919T does not nodulate the tree Leucaena
leucocephala, nor the herbaceous species Macroptilium atropurpureum, Trifolium
pratense, Medicago sativa, Lotus corniculatus and Galega orientalis. Here we describe
the features of E. arboris LMG 14919T together with genome sequence information
and its annotation. The 6,850,303 bp high-quality-draft genome is arranged into 7
scaffolds of 12 contigs containing 6,461 protein-coding genes and 84 RNA-only encoding
genes, and is one of 100 rhizobial genomes sequenced as part of the DOE Joint
Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule
Bacteria (GEBA-RNB) project.



Introduction 

Legume plants form nitrogen-fixing symbioses with root nodule bacteria,
collectively called rhizobia. These legumes are particularly useful crop
plants that do not require exogenous nitrogenous fertilizer to support
growth in less fertile, nitrogen-deficient conditions. They include some of our
staple food and feed plants such as beans, peas, soybeans, lentils, clover,
peanuts and alfalfa, and are mostly annual crops. In many arid and savannah
regions, leguminous trees represent a particularly valuable resource as they
are often deep-rooted and drought resistant. They have been used traditionally
in the Sahel region as sources of timber and fodder and for soil improvement
[1]. Prosopis chilensis, also known as Chilean mesquite, is a native tree from
South America that has many uses: its nutritious pods can be ground to produce
flour and are also eaten by livestock; its wood is used for construction and
furniture. Chilean mesquite is also used for intercropping with other plants,
for which it provides shelter and nutrients (leaf compost, nitrogen). Acacia
senegal (recently renamed as Senegalia senegal) is a plant of particular
importance in the production of gum arabic in the Sahel region and the Middle
East. Its seeds are dried for human consumption, and its leaves and pods serve
as feed for sheep, goats and camels. The plant is also used in agroforestry in
intercropping with watermelon and grasses, and in rotation systems with other
crops (Agroforestree Database [2]).

The microsymbiont of these legume trees from Sudan and Kenya [3] has been
renamed as Ensifer arboris [4], of which LMG 14919T (= HAMBI 1552,
ORS 1755, TTR38) is the type strain. This strain was isolated from root
nodules of Prosopis chilensis from Kosti, Sudan, and shown to effectively
nodulate its original host as well as Acacia senegal [5].

Given the drought tolerance of the host trees, it seems fitting that their
symbionts are also stress resistant: Ensifer arboris was described as tolerant
to temperatures up to 41-43 °C, 3% NaCl, several heavy metals (including Pb,
Cd, Hg, Cu) and a wide range of antibiotics [3,5], characteristics that
contribute to the success of the rhizobial-legume tree association in
challenging environmental conditions [6]. Here we present a summary
classification and a set of features for E. arboris strain LMG 14919T
(Table 1), together with the description of the complete genome sequence and
its annotation.

Classification and features 

E. arboris LMG 14919T is a motile, non-sporulating, non-encapsulated,
Gram-negative rod in the order Rhizobiales of the class Alphaproteobacteria.
The rod-shaped form varies in size with dimensions of approximately 0.25 μm
in width and 1.0-1.5 μm in length (Figure 1, Left and Center). The strain is
fast-growing, forming colonies within 3-4 days when grown on half strength
Lupin Agar (½LA) [19], tryptone-yeast extract agar (TY) [20] or a modified
yeast-mannitol agar (YMA) [21] at 28°C. Colonies on ½LA are white-opaque,
slightly domed and moderately mucoid with smooth margins (Figure 1, Right).

E. arboris LMG 14919T is capable of using several amino acids, including
L-proline, L-arginine, sodium glutamate and L-histidine, as sole nitrogen
sources and can use a wide range of different carbon sources including
L-arabinose, D-galactose, raffinose, L-rhamnose, maltose, lactose,
D-fructose, D-mannose, trehalose, D-ribose, xylene, methyl-D-mannoside,
sorbitol, dulcitol, meso-inositol, inulin, dextrin, amygdalin, arbutin,
sodium citrate, itaconate, α-ketoglutarate, sodium maltose, 1,2-propylene
glycol, and 1,2-butylene glycol [5].

Minimum Information about the Genome Sequence (MIGS) is provided in Table 1.
Figure 2 shows the phylogenetic neighborhood of E. arboris LMG 14919T in a
16S rRNA sequence based tree. This strain shares 99% (1361/1366 bp) and 99%
(1361/1366 bp) sequence identity to the 16S rRNA of the fully sequenced
E. meliloti Sm1021 [26] and E. medicae WSM419 [27] strains, respectively.
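
The identity values quoted for the 16S rRNA comparisons follow directly from the matched and aligned base counts. As a minimal sketch of that arithmetic (the function name `percent_identity` is illustrative only and is not part of any tool used in this study):

```python
def percent_identity(matches: int, aligned_length: int) -> float:
    """Return percent identity from matching positions and alignment length."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * matches / aligned_length


# Counts reported in the text for both 16S rRNA comparisons.
identity = percent_identity(1361, 1366)
print(f"{identity:.2f}% identity")
```

With the reported counts this evaluates to roughly 99.6%, consistent with the 99% stated in the text.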




Figure 1. Images of Ensifer arboris LMG 14919T using scanning (Left) and transmission (Center) electron microscopy
and the appearance of colony morphology on a solid medium (Right).

Table 1. Classification and general features of Ensifer arboris LMG 14919T according to the
MIGS recommendations [7]

MIGS ID    Property                 Term                          Evidence code
---------  -----------------------  ----------------------------  -------------
           Current classification   Domain Bacteria               TAS [8]
                                    Phylum Proteobacteria         TAS [9]
                                    Class Alphaproteobacteria     TAS [10,11]
                                    Order Rhizobiales             TAS [11,12]
                                    Family Rhizobiaceae           TAS [13,14]
                                    Genus Ensifer                 TAS [4,15,16]
                                    Species Ensifer arboris       TAS [4]
                                    Strain LMG 14919T
           Gram stain               Negative                      IDA
           Cell shape               Rod                           IDA
           Motility                 Motile                        IDA
           Sporulation              Non-sporulating               NAS
           Temperature range        Mesophile                     NAS
           Optimum temperature      28°C                          NAS
           Salinity                 Non-halophile                 NAS
MIGS-22    Oxygen requirement       Aerobic                       TAS [3]
           Carbon source            Varied                        TAS [5]
           Energy source            Chemoorganotroph              NAS
MIGS-6     Habitat                  Soil, root nodule, on host    TAS [3,5]
MIGS-15    Biotic relationship      Free living, symbiotic        TAS [3,5]
MIGS-14    Pathogenicity            Non-pathogenic                NAS
           Biosafety level          1                             TAS [17]
           Isolation                Root nodule                   TAS [5]
MIGS-4     Geographic location      Kosti, Sudan                  TAS [5]
MIGS-5     Soil collection date     1987                          IDA
MIGS-4.1   Longitude                32.66342                      TAS [5]
MIGS-4.2   Latitude                 13.16125                      TAS [5]
MIGS-4.3   Depth                    Not reported                  NAS
MIGS-4.4   Altitude                 Not reported                  NAS


Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a di- 
rect report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly ob- 
served for the living, isolated sample, but based on a generally accepted property for the spe- 
cies, or anecdotal evidence). These evidence codes are from the Gene Ontology project [18]. 

[Figure 2: phylogenetic tree image. Scale bar: 0.002. Node support values visible in the image: 94, 71, 51, 62, 76. Tip labels, top to bottom:]
Ensifer meliloti GVPV12 (Gi08912)
Ensifer meliloti AK83 (Gc01810)*
Ensifer meliloti Sm1021 (Gc00059)*
Ensifer meliloti SM11 (CP001830 Gc01686)*
Ensifer meliloti Mlalz-1 (Gi08913)
Ensifer meliloti RRI128 (Gi08915)
Ensifer meliloti AK58 (Gi07577)
Ensifer meliloti MVII-I (Gi08914)
Ensifer meliloti CIAM1775 (Gi08844)
Ensifer meliloti LMG 6133T (X67222)
Ensifer meliloti WSM1022 (Gi08916)
Ensifer meliloti 4H41 (Gi08911)
Ensifer medicae WSM419 (Gc00590)*
Ensifer medicae WSM1369 (Gi08907)
Ensifer medicae WSM1115 (Gi08906)
Ensifer medicae Di28 (Gi08905)
Ensifer medicae WSM244 (Gi08916)
Ensifer medicae WSM4191 (Gi08903)
Ensifer medicae A321T (L39882)
Ensifer arboris LMG 14919T (Gi08822) (syn: HAMBI 1552T)
Ensifer saheli LMG 7837T (X68390)
Ensifer kostiense LMG 19227T (AM181748)
Ensifer sp. TW10 (Gi08835)
Ensifer sp. WSM1721 (Gi08904)
Ensifer chiapanecum ITTG S70 (EU286550)
Ensifer mexicanum ITTG R7T (DQ411930)
Ensifer terangae LMG 7834T (X68388)

Figure 2. Phylogenetic tree showing the relationship of Ensifer arboris LMG 14919T (shown in bold print) to other
Ensifer spp. in the order Rhizobiales based on aligned sequences of the 16S rRNA gene (1,290 bp internal region). 
All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using 
MEGA, version 5 [22]. The tree was built using the Maximum-Likelihood method with the General Time Reversi- 
ble model [23]. Bootstrap analysis [24] with 500 replicates was performed to assess the support of the clusters. 
Type strains are indicated with a superscript T. Brackets after the strain name contain a DNA database accession 
number and/or a GOLD ID (beginning with the prefix G) for a sequencing project registered in GOLD [25]. Pub- 
lished genomes are indicated with an asterisk. 

Symbiotaxonomy

E. arboris LMG 14919T was initially shown to form nodules (Nod+) and fix
nitrogen (Fix+) with two leguminous tree species, P. chilensis and A. senegal.
It was unable to elicit nodules on the herbaceous perennials Macroptilium
atropurpureum, Trifolium pratense, Medicago sativa, Lotus corniculatus and
Galega orientalis [5]. The symbiotic properties of this strain in seedlings of
Acacia and Prosopis spp. in Sudan and Senegal have been reported in detail
[6]. Indeterminate nodules are induced, mainly on the lateral roots, either in
clusters or individually. Young nodules are spherical and later become
elongated and are commonly branched. LMG 14919T (= HAMBI 1552) was shown to
nodulate and fix nitrogen in seedlings of African A. mellifera, A. nilotica,
A. oerfota (synonym A. nubica), A. senegal, A. seyal, A. sieberiana, A. tortilis
subsp. raddiana, Latin American A. angustissima, P. chilensis and P. pallida,
and Afro-Asian P. cineraria. It also effectively nodulates Latin American
introductions of P. chilensis and P. juliflora in Africa [6]. It induced small
ineffective nodules on Australian A. holosericea and African P. africana [6].

Genome sequencing and annotation 

Genome project history 

This organism was selected for sequencing on the basis of its environmental
and agricultural relevance to issues in global carbon cycling, alternative
energy production, and biogeochemical importance, and is part of the Community
Sequencing Program at the U.S. Department of Energy, Joint Genome Institute
(JGI) for projects of relevance to agency missions. The genome project is
deposited in the Genomes OnLine Database [25] and an improved high-quality
draft genome sequence is deposited in IMG. Sequencing, finishing and
annotation were performed by the JGI. A summary of the project information is
shown in Table 2.

Growth conditions and DNA isolation 

E. arboris LMG 14919T was cultured to mid logarithmic phase in 60 ml of TY
rich medium on a gyratory shaker at 28°C [28]. DNA was isolated from the cells
using a CTAB (cetyl trimethyl ammonium bromide) bacterial genomic DNA
isolation method [29].

Genome sequencing and assembly 

The genome of Ensifer arboris LMG 14919T was sequenced at the Joint Genome
Institute (JGI) using Illumina technology [30]. An Illumina short-insert
paired-end library with an average insert size of 270 bp generated 19,256,666
reads and an Illumina long-insert paired-end library with an average insert
size of 9,232.94 +/- 2,530.88 bp generated 1,365,298 reads, totaling 3,093.3
Mbp of Illumina data. All general aspects of library construction and
sequencing performed at the JGI can be found at the JGI user home.
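
The 448x coverage reported for this project can be reproduced from the total Illumina yield (3,093.3 Mbp) and the estimated genome size (6.9 Mbp). A minimal sketch of that calculation, in which the function name `fold_coverage` is a hypothetical helper rather than part of the JGI pipeline:

```python
def fold_coverage(total_sequence_mbp: float, genome_size_mbp: float) -> float:
    """Average fold coverage = total sequenced bases / genome size."""
    if genome_size_mbp <= 0:
        raise ValueError("genome_size_mbp must be positive")
    return total_sequence_mbp / genome_size_mbp


# Values taken from the text: total Illumina data and estimated genome size.
coverage = fold_coverage(3093.3, 6.9)
print(f"approximately {coverage:.0f}x coverage")
```

Using the final assembly length (6,850,303 bp) in place of the 6.9 Mbp estimate would give a slightly higher value, so the reported figure appears to be based on the estimate.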



Table 2. Genome sequencing project information for E. arboris LMG 14919T.

MIGS ID     Property               Term
----------  ---------------------  ------------------------------------------------------------------
MIGS-31     Finishing quality      Improved high-quality draft
MIGS-28     Libraries used         Illumina Standard (short PE) and Illumina CLIP (long PE) library
MIGS-29     Sequencing platforms   Illumina HiSeq2000
MIGS-31.2   Sequencing coverage    Illumina: 448x
MIGS-30     Assemblers             Velvet version 1.1.05; Allpaths-LG version r38445
MIGS-32     Gene calling methods   Prodigal 1.4, GenePRIMP
            GenBank                ATYB00000000
            GenBank release date   July 15, 2013
            GOLD ID                Gi08822
            NCBI project ID        74465
            Database: IMG          2512047086
            Project relevance      Symbiotic N2 fixation, agriculture

The initial draft assembly contained 27 contigs in 9 scaffolds. The initial
draft data was assembled with Allpaths, version r38445, and the consensus was
computationally shredded into 10 Kbp overlapping fake reads (shreds). The
Illumina draft data was also assembled with Velvet, version 1.1.05 [31], and
the consensus sequences were computationally shredded into 1.5 Kbp overlapping
fake reads (shreds). The Illumina draft data was assembled again with Velvet
using the shreds from the first Velvet assembly to guide the next assembly.
The consensus from the second Velvet assembly was shredded into 1.5 Kbp
overlapping fake reads. The fake reads from the Allpaths assembly and both
Velvet assemblies and a subset of the Illumina CLIP paired-end reads were
assembled using parallel phrap, version SPS 4.24 (High Performance Software,
LLC). Possible mis-assemblies were corrected with manual editing in Consed
[32-34]. Gap closure was accomplished using repeat resolution software (Wei
Gu, unpublished), and sequencing of bridging PCR fragments using Sanger
(unpublished, Cliff Han) technology. For the improved high quality draft, one
round of manual/wet lab finishing was completed. A total of 46 additional
sequencing reactions were completed to close gaps and to raise the quality of
the final sequence. The estimated total size of the genome is 6.9 Mbp and the
final assembly is based on 3,093.3 Mbp of Illumina draft data, which provides
an average of 448x coverage of the genome.
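
The shredding step described above can be illustrated with a short sketch. This is not the JGI implementation: the function name `shred` is hypothetical, and the overlap between consecutive fake reads is an assumed parameter, since the text gives only the shred lengths (10 Kbp and 1.5 Kbp).

```python
def shred(consensus: str, length: int, overlap: int) -> list:
    """Cut a consensus sequence into overlapping fragments ("fake reads").

    length  -- fragment length in bases (e.g. 1500 or 10000 in the text)
    overlap -- bases shared by consecutive fragments (assumed; not stated in the text)
    """
    if length <= 0 or not 0 <= overlap < length:
        raise ValueError("require length > 0 and 0 <= overlap < length")
    if not consensus:
        return []
    step = length - overlap
    shreds = []
    start = 0
    while True:
        shreds.append(consensus[start:start + length])
        if start + length >= len(consensus):
            break  # the last fragment reaches the end of the sequence
        start += step
    return shreds


# Toy example with a 10-base sequence, 4-base shreds and a 2-base overlap.
toy_shreds = shred("ACGTACGTAC", length=4, overlap=2)
print(toy_shreds)
```

The final fragment may be shorter than `length` when the sequence length is not an exact fit; how a production pipeline handles that case is not described in the text.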



Genome annotation 

Genes were identified using Prodigal [35] as part of the DOE-JGI annotation
pipeline [36] followed by a round of manual curation using the JGI GenePRIMP
pipeline [37]. The predicted CDSs were translated and used to search the
National Center for Biotechnology Information (NCBI) non-redundant database,
UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data
sources were combined to assert a product description for each predicted
protein. Non-protein coding genes and miscellaneous features were predicted
using tRNAscan-SE [38], RNAmmer [39], searches against models of the ribosomal
RNA genes built from SILVA [40], Rfam [41], TMHMM [42], and SignalP [43].
Additional gene prediction analysis and manual functional annotation was
performed within the Integrated Microbial Genomes (IMG-ER) platform [44].

Genome properties 

The genome is 6,850,303 nucleotides long with 62.02% GC content (Table 3) and
comprises 7 scaffolds (Figure 3) of 12 contigs. Of a total of 6,545 genes,
6,461 were protein encoding and 84 were RNA-only encoding genes. The majority
of genes (80.78%) were assigned a putative function whilst the remaining genes
were annotated as hypothetical. The distribution of genes into COGs functional
categories is presented in Table 4.
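
The percentages quoted here and in Table 3 can be checked against the raw counts. The sketch below is illustrative only; the variable names are not from the source, and the figures are those reported in Table 3.

```python
# Raw counts as reported in Table 3.
genome_size_bp = 6_850_303
coding_bp = 5_921_899
gc_bp = 4_248_771
total_genes = 6_545
protein_coding_genes = 6_461
rna_genes = 84
genes_with_function = 5_287


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


coding_pct = percent(coding_bp, genome_size_bp)            # share of bases in coding regions
gc_pct = percent(gc_bp, genome_size_bp)                    # G+C content
protein_pct = percent(protein_coding_genes, total_genes)   # protein-coding share of genes
rna_pct = percent(rna_genes, total_genes)                  # RNA-gene share of genes
function_pct = percent(genes_with_function, total_genes)   # genes with predicted function

print(coding_pct, gc_pct, protein_pct, rna_pct, function_pct)
```

Base-level percentages use the genome size as the denominator, whereas gene-level percentages use the total gene count.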



Table 3. Genome Statistics for Ensifer arboris LMG 14919T

Attribute                          Value       % of Total
---------------------------------  ----------  ----------
Genome size (bp)                   6,850,303   100.00
DNA coding region (bp)             5,921,899   86.45
DNA G+C content (bp)               4,248,771   62.02
Number of scaffolds                7
Number of contigs                  12
Total genes                        6,545       100.00
RNA genes                          84          1.28
rRNA operons                       3           0.05
Protein-coding genes               6,461       98.72
Genes with function prediction     5,287       80.78
Genes assigned to COGs             5,233       79.95
Genes assigned Pfam domains        5,438       83.09
Genes with signal peptides         588         8.98
Genes with transmembrane helices   1,456       22.25
CRISPR repeats                     0

[Figure 3: graphical genome map image with a color legend for COG functional categories A to Z and "Not Assigned"]

Figure 3. Graphical map of the genome of Ensifer arboris LMG 14919T showing the seven largest scaffolds.
From bottom to the top of each scaffold: Genes on forward strand (color by COG categories as denoted by
the IMG platform), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, sRNAs red,
other RNAs black), GC content, GC skew.

Table 4. Number of protein coding genes of Ensifer arboris LMG 14919T associated with
the general COG functional categories.

Code  Value   % age   Description
----  ------  ------  -------------------------------------------------------------
J     195     3.35    Translation, ribosomal structure and biogenesis
A     0       0.00    RNA processing and modification
K     510     8.76    Transcription
L     212     3.64    Replication, recombination and repair
B     1       0.02    Chromatin structure and dynamics
D     49      0.84    Cell cycle control, mitosis and meiosis
Y     0       0.00    Nuclear structure
V     60      1.03    Defense mechanisms
T     248     4.26    Signal transduction mechanisms
M     274     4.71    Cell wall/membrane biogenesis
N     77      1.32    Cell motility
Z     0       0.00    Cytoskeleton
W     0       0.00    Extracellular structures
U     122     2.10    Intracellular trafficking and secretion
O     185     3.18    Posttranslational modification, protein turnover, chaperones
C     349     6.00    Energy production and conversion
G     598     10.27   Carbohydrate transport and metabolism
E     653     11.22   Amino acid transport and metabolism
F     104     1.79    Nucleotide transport and metabolism
H     201     3.45    Coenzyme transport and metabolism
I     205     3.52    Lipid transport and metabolism
P     292     5.02    Inorganic ion transport and metabolism
Q     182     3.13    Secondary metabolite biosynthesis, transport and catabolism
R     721     12.39   General function prediction only
S     582     10.00   Function unknown
-     1,312   20.05   Not in COGs
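
The percentages in Table 4 are consistent with two different denominators, which is an inference from the numbers rather than something stated in the text: category rows appear to be expressed relative to the sum of all category assignments (5,820, which exceeds the 5,233 genes assigned to COGs because a gene can belong to more than one category), while the "Not in COGs" row is relative to the total gene count (6,545). A minimal sketch of that check, with illustrative variable names:

```python
# Category counts from Table 4 (COG code -> number of assignments).
cog_counts = {
    "J": 195, "A": 0, "K": 510, "L": 212, "B": 1, "D": 49, "Y": 0,
    "V": 60, "T": 248, "M": 274, "N": 77, "Z": 0, "W": 0, "U": 122,
    "O": 185, "C": 349, "G": 598, "E": 653, "F": 104, "H": 201,
    "I": 205, "P": 292, "Q": 182, "R": 721, "S": 582,
}
not_in_cogs = 1_312
total_genes = 6_545  # from Table 3

# Assumed denominator for category rows: the sum of all category assignments.
total_assignments = sum(cog_counts.values())
cog_pct = {
    code: round(100.0 * n / total_assignments, 2)
    for code, n in cog_counts.items()
}

# Assumed denominator for the "Not in COGs" row: the total gene count.
not_in_cogs_pct = round(100.0 * not_in_cogs / total_genes, 2)

print(total_assignments, cog_pct["E"], not_in_cogs_pct)
```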